Dynamic audio playback of soundtracks for electronic visual works

ABSTRACT

An electronic book is provided with a soundtrack, to which a reader can listen while reading the electronic book. Playback of the soundtrack is synchronized with the visual display of the electronic book. Audio cues are associated with different points in the text and these cues are dynamically played back in synchronization with the visual display of the electronic book based on the interaction of the user with the electronic book. The dynamic playback involves editing and playing an audio cue so that it has a duration that is based on a prediction of the duration of the portion of the electronic book with which the cue is synchronized. When the system starts playing an audio cue, it predicts when the next audio cue should start. The current cue is played for the predicted duration, and a transition to the next audio cue is initiated at an appropriate time.

CLAIM OF PRIORITY AND RELATED APPLICATION DATA

This application is a continuation application of, and claims priority to, U.S. non-provisional patent application Ser. No. 12/943,917, filed Nov. 10, 2010, entitled “Dynamic Audio Playback of Soundtracks for Electronic Visual Works,” the contents of which are hereby incorporated by reference for all purposes as if fully set forth herein.

Note that U.S. non-provisional patent application Ser. No. 12/943,917 claims priority to U.S. provisional patent application Ser. No. 61/259,995, filed on Nov. 10, 2009, the contents of which are hereby incorporated by reference for all purposes as if fully set forth herein.

BACKGROUND

Electronic books are a kind of multimedia work that is primarily comprised of text, but also may include other visual media such as graphics and images. While text in an electronic book may be accompanied by other visual media, generally an electronic book is intended to be read from start to finish, although not necessarily in one sitting.

There are several file formats used for electronic books, including but not limited to various types of markup language document types (e.g., SGML, HTML, XML, LaTeX and the like), and other data file types, such as .pdf files, plain text files, etc. Various file formats are used with electronic book readers, such as the KINDLE reader from Amazon.com. Such a book reader generally is a computer program designed to run on a platform such as a personal computer, notebook computer, laptop computer, tablet computer, mobile device or dedicated hardware system for reading electronic books (such as the KINDLE reader).

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

FIG. 1 is a dataflow diagram of an electronic book reader with a dynamic audio player.

FIG. 2 is a dataflow diagram of more details of the dynamic audio player of FIG. 1.

FIG. 3 is an illustration of a cue list.

FIG. 4 is an illustration of an audio cue file.

FIG. 5 is a flow chart of the setup process when an electronic book is opened.

FIG. 6 is a flow chart describing how an audio cue file is used to create audio data of a desired duration.

FIG. 7 is a flow chart describing how reading speed is calculated.

FIG. 8 is a data flow diagram describing how a soundtrack can be automatically generated for an electronic book.

FIG. 9 is a block diagram that illustrates a computer system upon which an embodiment of the invention may be implemented.

DETAILED DESCRIPTION OF THE INVENTION

Approaches for dynamic audio playback of soundtracks for electronic visual works are presented herein. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention described herein. It will be apparent, however, that the embodiments of the invention described herein may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form or discussed at a high level in order to avoid unnecessarily obscuring teachings of embodiments of the invention.

Functional Overview

An electronic book is provided with a soundtrack, to which a reader can listen while reading the electronic book. The purpose of the soundtrack is to accompany and enhance the reading experience, in which readers generally have images in their minds based on the story or other visual media that is part of the electronic book. Playback of the soundtrack is synchronized with the visual display of the electronic book.

Synchronizing playback of a soundtrack to the visual display of an electronic book while the book is read by a reader is a challenging problem. Different individuals read at different speeds, and the same individual will read at different speeds at different times. The duration of the visual display of a portion of the electronic book varies from reader to reader, and between different points in time. In other words, the duration of the visual display of a portion of an electronic book is variable, depending on the user interaction with the electronic book. Yet the playback of the soundtrack, a kind of time-dependent media, is synchronized with this visual display.

To provide a good reading experience with a soundtrack in a manner that is applicable to multiple readers, audio cues are associated with different points in the text and these cues are dynamically played back in synchronization with the visual display of the electronic book based on the interaction of the user with the electronic book. The dynamic playback involves editing and playing an audio cue so that it has a duration that is based on a prediction of the duration of the portion of the electronic book with which the cue is synchronized. When the system starts playing an audio cue, it predicts when the next audio cue should start. The current cue is played for the predicted duration, and a transition to the next audio cue is initiated at an appropriate time.

Such a soundtrack generally is not just any music or sound; some music and sound could be distracting to the reader instead of enhancing the reading experience. Instead, the soundtrack includes music and sound designed to evoke emotions in the reader similar to those emotions that would be evoked by the text. Generally, a soundtrack for an electronic book benefits when there are few bright transient sounds, no vocals, and a spare, somewhat hypnotic feel to the music. Genre-wise, music that is too fast or too intense can be distracting and difficult to read to.

In its various aspects, the invention can be embodied in a computer-implemented process, a machine (such as an electronic device, or a general purpose computer or other device that provides a platform on which computer programs can be executed), processes performed by these machines, or an article of manufacture. Such articles can include a computer program product or digital information product in which computer program instructions or computer readable data are stored on a computer readable storage medium. Further aspects include processes and machines that create and use these articles of manufacture.

Accordingly, in one aspect, dynamic playback of audio involves receiving data about user interaction with a portion of an electronic visual work. A section of audio to be played back associated with the portion of the electronic visual work is dynamically adjusted in length according to the user interaction with the electronic visual work. In one implementation, the duration of the visual display of the portion of the electronic visual work is estimated according to the received data about user interaction with the portion of the electronic visual work. A sequence of sub-mixes of audio associated with the portion of the electronic visual work is selected so as to provide audio elements that will match the estimated duration. This estimation can be done using a history of reading speeds.
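As a rough illustration of estimating display duration from a history of reading speeds, the Python sketch below smooths past reading speeds (in words per minute) with an exponentially weighted average and converts a word count into seconds. The smoothing scheme and the `alpha` parameter are assumptions made for illustration; they are not necessarily how reading speed is calculated in connection with FIG. 7.

```python
from typing import Sequence


def estimate_reading_speed(history_wpm: Sequence[float], alpha: float = 0.3) -> float:
    """Exponentially weighted average of past reading speeds, in words per minute.

    Recent observations count more heavily; alpha is a hypothetical smoothing
    factor between 0 and 1.
    """
    if not history_wpm:
        raise ValueError("reading-speed history is empty")
    estimate = history_wpm[0]
    for speed in history_wpm[1:]:
        estimate = alpha * speed + (1 - alpha) * estimate
    return estimate


def estimate_display_duration(word_count: int, history_wpm: Sequence[float]) -> float:
    """Estimated seconds the reader will spend on a portion containing word_count words."""
    wpm = estimate_reading_speed(history_wpm)
    return word_count / wpm * 60.0


if __name__ == "__main__":
    # 300 words at a steady 250 words per minute takes 72 seconds.
    print(estimate_display_duration(300, [250.0, 250.0, 250.0]))
```

An exponentially weighted average reacts to a reader slowing down or speeding up without discarding the longer history entirely; a plain mean over a sliding window would be an equally reasonable choice.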

In another aspect, a soundtrack is played in synchronization with display of an electronic visual work. The electronic visual work is received into memory. Information associating portions of the electronic visual work with tags also is received into memory. Portions of the electronic visual work are displayed in response to user interaction. Audio files with tags are accessed. Audio files to be associated with portions of the electronic visual work are selected according to the tags associated with the portions of the electronic visual work. Data about user interaction with the portion of an electronic visual work is received and the duration of playback of audio associated with that specific portion of the electronic visual work is dynamically adjusted according to the user interaction.

In another aspect, a soundtrack for an electronic visual work is generated. The electronic visual work is received into memory. The electronic visual work is processed in the memory such that portions of the electronic visual work are marked with tags that will associate to specific portions of tagged audio files. Audio files with the appropriate tags are then accessed, and the target audio files for portions of the electronic visual work are selected and associated to create and play back the resulting soundtrack. The electronic visual work can include text, and the processing includes processing the text. The tags can include emotional descriptors.
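A minimal sketch of tag-based selection, assuming each portion of the work and each audio file carries a set of emotional descriptors. The overlap-count scoring rule, the tie-breaking by file name and the example file names are all hypothetical.

```python
from typing import Dict, List, Set


def select_audio_for_portions(
    portion_tags: List[Set[str]],
    audio_tags: Dict[str, Set[str]],
) -> List[str]:
    """For each tagged portion of the work, pick the audio file whose tags
    overlap most with the portion's tags (ties broken alphabetically)."""
    if not audio_tags:
        raise ValueError("no tagged audio files available")
    selections = []
    for tags in portion_tags:
        # Largest overlap wins; the file name makes the choice deterministic.
        best = min(audio_tags, key=lambda name: (-len(tags & audio_tags[name]), name))
        selections.append(best)
    return selections


if __name__ == "__main__":
    audio = {
        "calm_pad.m4a": {"calm", "hopeful"},
        "dark_pulse.m4a": {"suspense", "fear"},
    }
    portions = [{"suspense"}, {"calm", "hopeful"}, {"fear", "suspense"}]
    print(select_audio_for_portions(portions, audio))
    # ['dark_pulse.m4a', 'calm_pad.m4a', 'dark_pulse.m4a']
```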

In another aspect, a cue list includes, for each portion of an electronic visual work, an emotional descriptor, wherein the emotional descriptors correspond to emotional descriptors also associated with audio data.

In another aspect, an audio cue includes audio data for a plurality of submixes of the musical work (called “stems”) that can be mixed to provide audio data, and information indicative of how the stems can be repeated and combined to create the final result heard by the reader.

In another aspect, distribution of a soundtrack and its associated electronic visual work is done in a manner that enables the electronic visual work to be viewed in the same manner as if the soundtrack were not available. After a reader accesses an electronic visual work, a cue list is identified and read. As a background task, audio data is downloaded while a first cue in the soundtrack is played in synchronization with the display of the electronic visual work.

Yet other aspects are set forth in the following detailed description, and are provided by the various combinations of these different aspects of the invention.

Dynamic Audio Playback of Soundtracks for Electronic Visual Works

Soundtracks can be associated with any of a variety of electronic visual works, including electronic books. The types of music or audio that could be used also likely would depend on the type of work. For example, for works of fiction, the soundtrack will be similar in purpose to a movie soundtrack, i.e., to support the story: creating suspense, underpinning a love interest, or reaching a big climax. For children's books, the music may be similar to that used for cartoons, possibly including more sound effects, such as for when a page is being turned. For textbooks, the soundtrack may include rhythms and tonalities known to enhance knowledge retention, such as material at about 128 or 132 beats per minute and using significant modal tonalities. Some books designed to support meditation could have a soundtrack with sounds of nature, ambient sparse music, instruments with soft tones, and the like. Travel books could have music and sounds that are native to the locations being described. For magazines and newspapers, different sections or articles could be provided with different soundtracks and/or with different styles of music. Even reading different passes of the same page could have different soundtracks. Advertisers also could have their audio themes played during reading of such works. In such cases, the soundtracks could be selected in a manner similar to how text-based advertisements are selected to accompany other material.

In particular, referring now to FIG. 1, electronic content such as an electronic book 110 is input to an electronic device such as an electronic book reader 112, which provides a visual display of the electronic book to an end user or reader. The electronic content may also comprise any external content, such as a web page or other electronic document; therefore, the term electronic book in the present disclosure may encompass other types of electronic content as well. The electronic device may also comprise any device capable of processing and/or displaying electronic content, such as a computer, tablet, smartphone, portable gaming platform or other device; therefore, the term electronic book reader in the present disclosure may encompass other types of electronic devices as well. The electronic book 110 is one or more computer data files that contain at least text and are in a file format designed to enable a computer program to read, format and display the text. There are various file formats for electronic books, including but not limited to various types of markup language document types (e.g., SGML, HTML, XML, LaTeX and the like), and other document types, examples of which include, but are not limited to, EPUB, FictionBook, Plucker, PalmDoc, zTxt, TCR, CHM, RTF, OEB, PDF, Mobipocket, Calibre, Stanza, and plain-text. Some file formats are proprietary and are designed to be used with dedicated electronic book readers. The invention is not limited to any particular file format.

The electronic book reader 112 can be any computer program designed to run on a computer platform, such as described below in connection with FIG. 9, examples of which include, but are not limited to, a personal computer, tablet computer, mobile device or dedicated hardware system for reading electronic books and that receives and displays the contents of the electronic book 110. There are a number of commercially or publicly available electronic book readers, examples of which include, but are not limited to, the KINDLE reader from Amazon.com, the Nook reader from Barnes & Noble, the Stanza reader, and the FBReader software, an open source project. However, the invention is not limited to any particular electronic book reader.

The electronic book reader 112 also outputs data 114 indicative of the user interaction with the electronic book reader 112, so that such data can be used by a dynamic audio player 116. Commercially or publicly available electronic book readers can be modified in accordance with the description herein to provide such outputs.

The data about the user interaction with the text can come in a variety of forms. For example, an identifier of the book being read (such as an ISBN, e-ISBN number or hash code), and the current position in the text can be provided. Generally, the current position is tracked by the electronic book reader as the current “page” or portion of the electronic book that is being displayed. The electronic book reader can output this information when it changes. Other information that can be useful, if provided by the electronic book reader 112, includes, but is not limited to, the word count for a current range of the document being displayed, an indication of when the user has exited the electronic book reader application, and an indication of whether the reader has paused reading or resumed reading after a pause.

The information and instructions exchanged between the electronic book reader and the dynamic audio player can be implemented through an application programming interface (API), so that the dynamic audio player can request that the electronic book reader provide status information, or perform some action, or so that the electronic book reader can control the other application program. The dynamic audio player can be programmed to implement this API as well. An example implementation of the API includes, but is not limited to, two interfaces, one for calls from the electronic book reader application, and another for calls to the electronic book reader application.
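The two-interface arrangement can be sketched with Python `typing.Protocol` classes. Only a few of the calls described in this section are shown, the parameter names and types are assumptions, and `RecordingPlayer` and `simulate_session` are hypothetical helpers used only to show the direction of the calls.

```python
from typing import List, Protocol, Tuple

# Hypothetical representation of a displayed range: (first word, last word).
Range = Tuple[int, int]


class DynamicAudioPlayerAPI(Protocol):
    """Interface for calls from the electronic book reader to the player."""

    def ebookOpenedwithUniqueID(self, unique_id: str, opened_before: bool) -> None: ...

    def displayedPositionRangeChanged(self, new_range: Range) -> None: ...

    def ebookClosed(self) -> None: ...


class EbookReaderAPI(Protocol):
    """Interface for calls from the player to the electronic book reader."""

    def readingPaused(self) -> None: ...

    def gotoPosition(self, position: int) -> None: ...

    def wordCountForRange(self, word_range: Range) -> int: ...


class RecordingPlayer:
    """Toy player that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def ebookOpenedwithUniqueID(self, unique_id: str, opened_before: bool) -> None:
        self.calls.append(("opened", unique_id, opened_before))

    def displayedPositionRangeChanged(self, new_range: Range) -> None:
        self.calls.append(("range", new_range))

    def ebookClosed(self) -> None:
        self.calls.append(("closed",))


def simulate_session(player: DynamicAudioPlayerAPI) -> None:
    """A reader opens a book for the first time, turns one page, then closes it."""
    player.ebookOpenedwithUniqueID("example-book-id", opened_before=False)
    player.displayedPositionRangeChanged((0, 249))
    player.displayedPositionRangeChanged((250, 509))
    player.ebookClosed()
```

Because `Protocol` uses structural typing, any reader or player object with matching methods satisfies the interface without inheriting from it, which suits retrofitting an existing electronic book reader.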

Example calls that the electronic book reader can make to the dynamic audio player include:

“ebookOpenedwithUniqueID”—This function is called by the electronic book reader when the application opens an electronic book. This function has parameters that specify the electronic book's unique identifier and whether the electronic book has been opened before. In response to this information the dynamic audio player sets the current cue. The first time an electronic book is opened, the current position will be set to the start of the first cue.

“ebookClosed”—This function is called by the electronic book reader when the application closes an electronic book. In response to this call, the dynamic audio player can free up memory and reset internal data.

“ebookRemoved”—This function is called when the electronic book reader has removed an ebook from its library, so that the soundtrack and audio files also can be removed.

“displayedPositionRangeChanged”—This function is called when the electronic book reader changes its display, for example, due to a page turn, orientation change, font change or the like, and provides parameters for the range of the work that is newly displayed. In response to this call the dynamic audio player can set up audio cues for the newly displayed range of the work.

“readingResumed”—This function is called when the user has resumed reading after an extended period of inactivity, which the electronic book reader detects by receiving any of a variety of inputs from the user (such as a page turn command) after reading has been determined to be “paused.”

“fetchSoundtrack”—This function is called by the electronic book reader to instruct the dynamic audio player to fetch and import the soundtrack file, or cue list, for the electronic book with a specified unique identifier (provided as a parameter of this function).

“audioVolume”—This function is called by the electronic book reader to instruct the dynamic audio player to set the volume of the audio playback.

“getCueLists”—This function is called by the electronic book reader to retrieve information from the dynamic audio player about the cue lists and groups available for the currently opened electronic book. This function would allow the electronic book reader to present this information to the reader, for example.

“cueListEnabled”—This function is called by the electronic book reader to instruct the dynamic audio player to enable or disable a particular cue list, e.g., an alternative soundtrack, sound effects, a recorded reader or text-to-speech conversion.

“audioIntensity”—This function is called by the electronic book reader to instruct the dynamic audio player to set the intensity of the audio playback, e.g., to make the audio composition quieter or mute a drum stem (submix).

“audioPreloadDefault”—This function is called to set a default number of hours of audio to download and keep on hand generally for electronic books.

“audioPreloadForEbook”—This function is called to set a number of hours of audio to download and keep for a specific ebook.

“downloadEnabled”—This function is called to enable or disable audio downloading.

Example calls that the dynamic audio player can make to the electronic book reader include:

“readingPaused”—This function is called by the dynamic audio player if it has not received a “displayedPositionRangeChanged” call from the electronic book reader within an expected time. From this information, it is assumed by the dynamic audio player that the user is no longer reading. After this function has been called, the electronic book reader should call the “readingResumed” function when the user starts reading again.

“gotoPosition”—This function is called by the dynamic audio player to instruct the electronic book reader to set the current position in the book, usually at the start point of the first cue the first time the electronic book is opened, in response to the “ebookOpenedwithUniqueID” function being called.

“wordCountForRange”—This function is called by the dynamic audio player to instruct the electronic book reader to provide a number of words for a specified range of the electronic book, to be used in scheduling playlists and tracking reading speed as described in more detail below.
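The timeout behind readingPaused might be tracked as in the following sketch. It assumes the expected time is a fixed multiple (`slack_factor`, a hypothetical parameter) of the estimated time needed to read the displayed range. Timestamps are passed in explicitly so the logic does not depend on a real clock, and `PauseDetector` is a hypothetical helper.

```python
class PauseDetector:
    """Decides when the dynamic audio player should call readingPaused."""

    def __init__(self, slack_factor: float = 2.0) -> None:
        self.slack_factor = slack_factor  # hypothetical tolerance multiplier
        self.last_change = 0.0  # time of the last displayedPositionRangeChanged
        self.expected_read_time = 0.0  # estimated seconds to read the displayed range
        self.paused = False

    def on_range_changed(self, now: float, expected_read_time: float) -> None:
        # A display change means the user is reading; clear any paused state.
        self.last_change = now
        self.expected_read_time = expected_read_time
        self.paused = False

    def should_signal_pause(self, now: float) -> bool:
        """Return True once, when the expected time has been exceeded."""
        deadline = self.last_change + self.slack_factor * self.expected_read_time
        if not self.paused and now > deadline:
            self.paused = True
            return True
        return False
```

A player could poll `should_signal_pause` from a timer and call readingPaused on the electronic book reader when it returns True. Returning True only once avoids repeated pause notifications while the user stays away.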

The use of these API calls is described in more detail below.

The electronic book 110 has an associated cue list 118, described in more detail below in connection with FIG. 3, which associates portions of the text with audio cues 120. In general, an identifier used to uniquely identify the electronic book 110 is used to associate the cue list 118 to the book by either embedding the identifier in the cue list or having a form of lookup table or map that associates the identifier of the book with the cue list 118. An audio cue 120 is a computer data file that includes audio data. In general, an audio cue 120 associated with a portion of the text by the cue list 118 is played back while the reader is reading that portion of the text. For example, a portion of the text may be designated by a point in the text around which the audio cue should start playing, or a range in the text during which the audio cue should play. The dynamic audio player 116 determines when and how to stop playing one audio cue and start playing another.

The dynamic audio player 116 receives data 114 about the user interaction with the electronic book reader 112, as well as cues 120 and the cue list 118. As will be described in more detail below, the dynamic audio player 116 uses the user interaction data 114 and the cue list 118 to select the audio cues 120 to be played, and when and how to play them, to provide an output audio signal 122.

During playback of the soundtrack, the dynamic audio player plays a current cue, associated with the portion of the text currently being read, and determines how and when to transition to the next cue to be played, based on the data about the user interaction with the text. As shown in more detail in FIG. 2, the dynamic audio player 200 thus uses a current cue 204 and a next cue 210 to generate audio 206. The cues 204 and 210 to be played are determined through a cue lookup 208, using the data 212 about the user interaction, and the cue list 202. While the dynamic audio player is playing the current cue 204, it monitors the incoming data 212 to determine when the next cue should be played. The current cue 204 may need to be played for a longer or shorter time than the cue's actual duration. As described in more detail below, the dynamic audio player lengthens or shortens the current cue so as to fit the amount of time the user is taking to read the associated portion of the text, and then implements a transition, such as a cross fade, at the estimated time at which the user reaches the text associated with the next cue.
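One way to place the cross fade is sketched below. It assumes the fade should be centered on the estimated moment the reader reaches the text of the next cue. The default fade length of 4 seconds and the centering rule are assumptions; cue file metadata could instead specify the fade style and placement.

```python
from typing import Tuple


def crossfade_window(
    cue_started_at: float,
    estimated_cue_duration: float,
    fade_seconds: float = 4.0,
) -> Tuple[float, float]:
    """Return (fade_start, fade_end) on the playback clock, in seconds.

    The fade is centered on the estimated moment the reader reaches the
    text of the next cue, i.e. cue_started_at + estimated_cue_duration.
    The 4-second default is a hypothetical value.
    """
    if estimated_cue_duration < 0 or fade_seconds < 0:
        raise ValueError("durations must be non-negative")
    boundary = cue_started_at + estimated_cue_duration
    half = fade_seconds / 2.0
    # Never begin fading out the current cue before it has started.
    fade_start = max(cue_started_at, boundary - half)
    return fade_start, boundary + half
```

For a cue that started at 10 s with an estimated duration of 120 s, the fade runs from 128 s to 132 s. Each time the duration estimate is revised, the window can simply be recomputed.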

Referring now to FIG. 3, an example implementation of the cue list 118 of FIG. 1 will now be described in more detail. Audio cues, e.g., 120 in FIG. 1 and 204, 210 in FIG. 2, are assigned to portions of the text. This assignment can be done using a meta-tag information file that associates portions of the text with audio files. The association with an audio file may be direct or indirect, and may be statically or dynamically defined. For example, different portions of the text can be assigned different words or other labels indicative of emotions, moods or styles of music to be associated with those portions of the text. Audio files then can be associated with such words or labels. The audio files can be selected and statically associated with the text, or they can be selected dynamically at the time of playback, as described in more detail below. Alternatively, different points in the text may be associated directly with an audio file.

An example meta-tag information file is shown in FIG. 3. The meta-tag information file is a list 300 of pairs 302 of data representing a cue. Each pair 302 representing a cue includes a reference 304 to the text, such as a reference to a markup language element within a text document, an offset from the beginning of a text document, or a range within a text document. The pair 302 also includes data 306 that specifies the cue. This data may be a word or label, such as an emotive tag, or an indication of an audio file, such as a file name, or any other data that may be used to select an audio file. How a composer or a computer program can create such cue lists will be described in more detail below.
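The cue lookup of FIG. 2 can be sketched as a binary search. This assumes each reference 304 is a word offset from the beginning of the text and that the pairs are sorted by offset. The `Cue` type and its field names are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cue:
    text_offset: int  # reference 304, assumed here to be a word offset
    cue_data: str  # data 306: an emotive tag or an audio file name


def cue_at(cue_list: List[Cue], position: int) -> Optional[Cue]:
    """Current cue: the last cue starting at or before position (list sorted by offset)."""
    offsets = [cue.text_offset for cue in cue_list]
    index = bisect.bisect_right(offsets, position) - 1
    return cue_list[index] if index >= 0 else None


def next_cue(cue_list: List[Cue], position: int) -> Optional[Cue]:
    """Next cue: the first cue starting after position, or None at the end."""
    offsets = [cue.text_offset for cue in cue_list]
    index = bisect.bisect_right(offsets, position)
    return cue_list[index] if index < len(cue_list) else None
```

With cues at offsets 0, 1200 and 3400, a reader at word 1500 is in the second cue, and the third cue is next.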

The meta-tag information file can be implemented as a file that is an archive containing several metadata files. These files can be in JavaScript Object Notation (JSON) format. The meta-tag information file can include a manifest file that contains general information about the soundtrack, such as the unique identifier of the electronic book with which it is associated, the title of the electronic book, a schema version (for compatibility purposes, in case the format changes in the future), and a list of other files in the archive, with checksums for integrity checking. In addition to the manifest file, the meta-tag information file also includes a cuelists file, which contains the list of cue list descriptors available in the soundtrack. Each cue list descriptor includes a display name, a unique identifier for lookup purposes and an optional group name of the cue list. As an example, there may be several mutually exclusive main cue lists, of which it only makes sense to have a single one playing. These cue lists might have a group name of “main,” whereas sound effects or “read to me” cue lists can all play at the same time, and thus would not use the group name.
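The manifest checksums and the group semantics of the cuelists file could be handled as follows. The JSON field names (`files`, `name`, `sha256`, `id`, `group`) and the choice of SHA-256 are assumptions, because the description specifies neither the schema nor the checksum algorithm.

```python
import hashlib
import json
from typing import Dict, List


def verify_manifest(manifest_json: str, archive: Dict[str, bytes]) -> List[str]:
    """Return the names of archive files that are missing or fail their checksum."""
    manifest = json.loads(manifest_json)
    bad = []
    for entry in manifest["files"]:  # hypothetical key
        data = archive.get(entry["name"])
        if data is None or hashlib.sha256(data).hexdigest() != entry["sha256"]:
            bad.append(entry["name"])
    return bad


def enable_cue_list(cuelists_json: str, enabled: List[str], new_id: str) -> List[str]:
    """Enable cue list new_id and return the resulting enabled list.

    Cue lists sharing new_id's group (e.g. "main") are mutually exclusive and
    are switched off; ungrouped cue lists (sound effects, read to me) stay on.
    """
    descriptors = {d["id"]: d for d in json.loads(cuelists_json)}
    group = descriptors[new_id].get("group")
    kept = [
        cid for cid in enabled
        if cid != new_id and (group is None or descriptors[cid].get("group") != group)
    ]
    return kept + [new_id]
```

For example, switching from one main soundtrack to another keeps an enabled sound effects cue list playing while disabling the first main soundtrack.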

The meta-tag information file also includes a cues file that contains the list of cue descriptors for all of the cue lists. Each cue descriptor includes a descriptive name given to the cue descriptor by a producer. This descriptor could be entered using another application for this purpose, and could include information such as a cue file name that is used to look up the location of the cue file in the list of cue files, and in and out points in the electronic book.

Finally, the meta-tag information file includes a “cuefiles” file that contains the list of cue file descriptors. The cuefiles file specifies the network location of the cue files. Each cue file descriptor includes a descriptive name given to the cue file by a producer and used as the cue file name in the cue descriptor, a uniform resource locator (URL) for retrieving the cue file, and the original file name of the cue file.
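The following sketch resolves each cue descriptor to the URL of its cue file through the cuefiles list. The JSON keys (`name`, `cueFileName`, `url`, `originalFileName`) and the example.com URL are hypothetical.

```python
import json
from typing import Dict


def resolve_cue_urls(cues_json: str, cuefiles_json: str) -> Dict[str, str]:
    """Map each cue's descriptive name to the URL of its cue file."""
    # Index the cue file descriptors by the name used in cue descriptors.
    url_by_file = {f["name"]: f["url"] for f in json.loads(cuefiles_json)}
    resolved = {}
    for cue in json.loads(cues_json):
        file_name = cue["cueFileName"]
        if file_name not in url_by_file:
            raise KeyError(f"cue {cue['name']!r} refers to unknown cue file {file_name!r}")
        resolved[cue["name"]] = url_by_file[file_name]
    return resolved


if __name__ == "__main__":
    cues = json.dumps([{"name": "Opening", "cueFileName": "calm-01", "in": 0, "out": 1200}])
    cuefiles = json.dumps([{
        "name": "calm-01",
        "url": "https://cdn.example.com/cues/calm-01.cue",
        "originalFileName": "Calm Opening v3.cue",
    }])
    print(resolve_cue_urls(cues, cuefiles))
```

Failing loudly on a dangling cue file name lets a packaging tool catch an inconsistent soundtrack before it is distributed.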

The audio cues (120 in FIG. 1) referred to in such a cue list contain audio data, which may be stored in audio file formats, such as AIFF, MP3, AAC, m4a or other file types. Referring now to FIG. 4, an example implementation of an audio cue file will be described. An audio cue file 400 can include multiple “stems” (submixes) 402, each of which is a separate audio file that provides one part of a multipart audio mix for the cue. The use of such stems allows the dynamic audio player to select from among the stems to repeat in order to lengthen the playback time of the cue. An audio cue file also can include information that is helpful to the dynamic audio player to modify the duration for which the audio cue is played, such as loop markers 404, bar locations 406 and recommended mix information 408. The recommended mix information includes a list of instructions for combining the audio stems, where each instruction indicates the stems and sections to be used, and any audio effects processing to be applied. Other information, such as a word or label indicative of the emotion or mood intended to be evoked by the audio, or data indicative of genre, style, instruments, emotion, atmosphere, place or era (called descriptors 410), also can be provided. Further information, such as alternative keywords, cue volume, cross-fade or fade-in/out shape and intensity, and recommended harmonic progression for successive cues, also can be included.

As an example, the audio cue file can be implemented as an archive containing a metadata file in JSON format and one or more audio files for stems of the cue. The metadata file contains a descriptor for the metadata associated with the audio files, which includes bar locations, loop markers, recommended mix information, emodes (emotional content meta-tags), audio dynamics control metadata (dynamic range compression), instruments, atmospheres and genres. The audio files can include data-compressed audio files and high resolution original audio files for each stem. Retaining the high resolution versions of each stem supports later editing using music production tools. A copy of the audio cue files without the original audio files can be made to provide for smaller downloads to electronic book readers. The cue file contains the compressed audio files for the stems, which are the files used for playback in the end user applications.
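The following sketch loads the metadata under the assumption that the archive is a ZIP file whose metadata member is named `metadata.json`. The archive format, the member name, the JSON keys and the `CueMetadata` type are assumptions, and only a subset of the listed metadata fields is modeled.

```python
import io
import json
import zipfile
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CueMetadata:
    bar_locations: List[float]  # seconds at which each bar starts
    loop_markers: Dict[str, Tuple[float, float]]  # section -> (loop start, loop end)
    emodes: List[str] = field(default_factory=list)  # emotional content meta-tags
    stems: List[str] = field(default_factory=list)  # compressed stem file names


def load_cue_metadata(archive_bytes: bytes) -> CueMetadata:
    """Read the (assumed) metadata.json member of a ZIP-format cue file."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        meta = json.loads(archive.read("metadata.json"))
    return CueMetadata(
        bar_locations=meta["barLocations"],
        loop_markers={name: (v[0], v[1]) for name, v in meta["loopMarkers"].items()},
        emodes=meta.get("emodes", []),
        stems=meta.get("stems", []),
    )
```

Reading only the metadata member lets a player inspect a cue before decoding any of its stem audio.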

The cue files can be created using a software tool that inputs a set of standard audio stems, adds descriptor, loop point and recommended mix meta information as a separate text file, optimizes and compresses the audio for network delivery, and outputs a single package file that can be uploaded to a database. An audio file can be analyzed using various analytic techniques to locate sections, beats, loudness information, fades, loop points and the like. Cues can be selected using the descriptors “genre, style, instruments, emotion, place, era” and delivered over the network as they are used by the reader.

The cue lists and cue files can be individually encrypted and linked to a specific work for which they are the soundtrack. The same key would be used to access the work and its soundtrack. Thus files could be tied to the specific work or the specific viewing device through which the work was accessed, and can use digital rights management information associated with the work.

Given the foregoing understanding of cue lists, the audio cues, and the interaction available with the electronic book reader, the dynamic audio player will now be described in more detail in connection with FIGS. 5-7.

To initiate playback when a book is first opened (500) by a reader, the electronic book reader calls 502 the “ebookOpenedwithUniqueID” function, indicating the book's unique identifier and whether the book had been opened before. The dynamic audio player receives 504 the identifier of the electronic book, and downloads or reads 506 the cue list for the identified book. The electronic book reader prompts the dynamic audio player for information about the cue list, by calling 508 the “getCueLists” function. The dynamic audio player sends 510 the cue list, which the electronic book reader presents to the user to select 512 one of the soundtracks (if there is more than one soundtrack) for the book. Such a selection could be enhanced by using a customer feedback rating system that allows users to rate soundtracks, and these ratings could be displayed to users when a selection of a soundtrack is requested by the system. The “cueListEnabled” function is then called 514 to inform the dynamic audio player of the selected cue list, which the dynamic audio player receives 516 through the function call. The “fetchSoundtrack” function is called 518 to instruct the dynamic audio player to fetch 520 the cues for playback.

After this setup process completes, the dynamic audio player has the starting cue and the cue list, and thus the current cue, for initiating playback. Playback can be started at about the time this portion of the electronic book is displayed by the electronic book reader. The dynamic player then determines, based on the data about the user interaction with the book, the next cue to play, when to play the cue, and how to transition to the next cue from the current cue.

The dynamic audio player extends or shortens the playback time of a cue's audio stem files to fit the estimated total cue duration. This estimated cue duration can be computed in several ways. An example implementation uses an estimate of the reading speed, the computation of which is described in more detail below. The current cue duration is updated in response to the data that describes the user interaction with the electronic book reader, such as provided at every page turn through the “displayedPositionRangeChanged” function call.
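The estimate could be refreshed on each displayedPositionRangeChanged call as in this sketch. It assumes a reading-speed estimate in words per minute and a word count for the rest of the cue's text (for example, one obtained through wordCountForRange). The formula, elapsed playback time plus the estimated time to read the remaining words, is an assumption.

```python
def updated_cue_duration(
    elapsed_seconds: float,
    words_remaining_in_cue: int,
    reading_speed_wpm: float,
) -> float:
    """Estimated total duration of the current cue, in seconds.

    elapsed_seconds: how long the cue has already been playing.
    words_remaining_in_cue: words from the start of the newly displayed range
        to the end of the text associated with the cue (assumed available).
    reading_speed_wpm: current reading-speed estimate in words per minute.
    """
    if reading_speed_wpm <= 0:
        raise ValueError("reading speed must be positive")
    return elapsed_seconds + words_remaining_in_cue / reading_speed_wpm * 60.0
```

For example, a cue that has played for 90 seconds with 600 words left at 240 words per minute is re-estimated at 240 seconds in total.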

In general, the playback time of a cue's audio stem files is modified byautomatically looping sections of the audio stem files, varying theindividual stem mixes and dynamically adding various effects such asreverb, delays and chorus. The loop points and other mix automation dataspecific to the audio stem files are stored in the cue file's metadata.There can be several different loop points in a cue file. The sectionsof the audio stems can be selected so that, when looped and remixed,they provide the most effective and interesting musical end userexperience. This process avoids generating music that has obviousrepetitions and maximizes the musical content to deliver a musicallypleasing result that can have a duration many times that of the originalpiece(s) of audio. When the next cue is triggered, the transitionbetween the outgoing and the incoming audio is also managed by the sameprocess, using the cue file metadata to define the style and placementof an appropriate cross fade to create a seamless musical transition.

As an example, assume a cue file contains four audio stems (a melodytrack, a sustained chordal or “pad” track, a rhythmic percussive (oftendrums) track and a rhythmic harmonic track) that would run for 4 minutesif played in a single pass. Further assume that this recording has 3distinct sections, A, B and C. The meta information in the cue file willinclude:

1. How to transition into the cue from a previous cue. This includes the transition style (e.g., a slow, medium or quick fade-in, or stopping the previous cue with a reverb tail and starting the new cue from its beginning), and musical bar and beat markers so that the cross-fade will be musically seamless.

2. The time positions where each of the A, B and C sections can be looped.

3. The cue producer's input on how the four stems can be remixed. For example: play stems 1, 2 and 3 only using section A; then play stems 1, 3 and 4 only using section A; add reverb to stem 3 and play it on its own using section B; then play stems 3 and 4 from section B; and so on. Having these kinds of instructions means that a typical four-minute piece of audio can be extended to 40 or more minutes without obvious repetition. In addition, each mix is unique for the user and is created at the time of playback, so unauthorized copying of the soundtrack is more difficult.
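The cue metadata in this example could be modeled with a few plain data types. This is a sketch under assumed type and field names (`Section`, `RemixStep`, `CueMetadata`, times in seconds); the text does not define a file format. The helper computes how long one pass through the producer's remix instructions lasts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A loopable section of the stems, e.g. section A of the recording."""
    name: str
    start_s: float  # loop start, seconds into the stems
    end_s: float    # loop end, seconds into the stems

    @property
    def length_s(self) -> float:
        return self.end_s - self.start_s


@dataclass(frozen=True)
class RemixStep:
    """One producer instruction: which stems play, over which section, with which effects."""
    stems: tuple[int, ...]
    section: str
    effects: tuple[str, ...] = ()


@dataclass
class CueMetadata:
    transition_style: str                # e.g. "slow_fade" or "reverb_tail" (assumed labels)
    beat_markers_s: tuple[float, ...]    # bar/beat positions used to place cross-fades
    sections: dict[str, Section]
    remix: tuple[RemixStep, ...]

    def remix_length_s(self) -> float:
        """Duration of one pass through the remix instructions."""
        return sum(self.sections[step.section].length_s for step in self.remix)
```

If, hypothetically, section A spans 0 to 60 s and section B spans 60 to 150 s, the four example instructions above give one pass of 60 + 60 + 90 + 90 = 300 s, already longer than the 240 s recording before any instruction is repeated.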

As an example, this process will now be described in more detail with reference to FIG. 6. Given a cue and a starting point, the duration of time until the next cue is to be played is determined (600). An example way to compute this duration is provided in more detail below. Given the duration, the cue producer's input is processed to produce a playlist of the desired duration. In other words, the first instruction in the remix information is selected 602 and added to the playlist. If the playlist built so far has a duration less than the desired duration, as determined at 604, then the next instruction is selected 606, and the process repeats until a playlist of the desired duration is completed 608. At the end of the cue, the transition information in the metadata for the next cue is used to select 610 a starting point in the current playlist to implement a cross-fade from the current cue to the next cue.
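A minimal sketch of the FIG. 6 loop, assuming each remix instruction has already been reduced to a label and a length in seconds. Two policies here are assumptions not stated in the text: when the instructions run out, the loop starts again from the first one, and the final entry is trimmed so the playlist lands exactly on the desired duration (a real player would more likely trim at a bar or beat marker). The cross-fade start is likewise an assumed policy: begin the fade so that it finishes when the next cue is due.

```python
from itertools import cycle


def build_playlist(steps, desired_s):
    """Sketch of FIG. 6: add remix steps until the playlist reaches desired_s.

    steps: (label, length_s) pairs from the producer's remix instructions.
    Assumed policies: wrap around to the first step when the list runs out,
    and trim the last entry so the total equals desired_s exactly.
    """
    if desired_s <= 0 or not steps:
        return []
    if any(length <= 0 for _, length in steps):
        raise ValueError("every step needs a positive length")
    playlist, total = [], 0.0
    for label, length in cycle(steps):      # 602/606: select the next instruction
        remaining = desired_s - total
        if length >= remaining:              # 604: this step reaches the target
            playlist.append((label, remaining))
            return playlist                  # 608: playlist complete
        playlist.append((label, length))
        total += length


def crossfade_start(desired_s, fade_s):
    """610 (assumed policy): start the fade so it ends when the next cue is due."""
    return max(0.0, desired_s - fade_s)
```

For example, with steps of 60 s, 60 s and 90 s and a predicted duration of 250 s, the playlist is the three steps followed by the first step trimmed to 40 s.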

One way to estimate the duration of a cue is to estimate the reading speed of the reader (in words per minute) and, given the number of words in the cue, determine how much time the reader is likely to take to complete reading this portion of the book. This estimate can be computed from a history of reading speed information for the reader.

When the user starts reading a book, an initial reading speed of a certain number of words per minute is assumed. This initial speed can be calculated from a variety of data about a user's previous reading speed history from reading previous books, which can be organized by author, by genre, by time of day, by location, and across all books. If no previous reading history is available, then an anonymous global tally of how other users have read this title can be used. If no other history is available, a typical average of 400 words per minute is used.
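The initial-speed fallback and the duration estimate reduce to a few lines. In this sketch the priority order among the user's own history groupings (`HISTORY_KEYS`) is an assumption, since the text lists author, genre, time of day, location and all books without ranking them; the 400 words-per-minute default comes from the text.

```python
from __future__ import annotations

DEFAULT_WPM = 400.0  # fallback given in the text when no history exists

# Hypothetical priority order for the user's own history; the text lists
# these groupings but does not say which takes precedence.
HISTORY_KEYS = ("author", "genre", "time_of_day", "location", "all_books")


def initial_wpm(user_history: dict[str, float], title_global_wpm: float | None) -> float:
    """Starting reading speed: user's history, then anonymous title tally, then default."""
    for key in HISTORY_KEYS:
        if key in user_history:
            return user_history[key]
    if title_global_wpm is not None:
        return title_global_wpm
    return DEFAULT_WPM


def estimate_cue_seconds(cue_word_count: int, wpm: float) -> float:
    """Predicted time to read the cue's text: words / (words per minute), in seconds."""
    if wpm <= 0:
        raise ValueError("reading speed must be positive")
    return cue_word_count / wpm * 60.0
```

For instance, a 1,200-word cue read at the 400 wpm default is predicted to last 1200 / 400 × 60 = 180 seconds.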

Referring now to FIG. 7, the reading speed for the user is tracked each time the displayed position range is changed, as indicated by the “displayedPositionRangeChanged” function call. If this function call is received (700), then several conditions are checked 702. These conditions can include the following, though they are not limited to these and not all are required: the user is actively reading, i.e., not in the reading paused state; the new displayed position range is greater than the previously displayed position range; the start of the newly displayed position range touches the end of the previously displayed position range; and the word count is above a minimum amount (currently 150 words). The time since the last change also should be within a sensible range, such as within one standard deviation of the average reading speed, to check that the speed is within the normal expected variance. If these conditions are met, then the current time is recorded 704. The time since the last change to the displayed position range is computed and stored 706, together with the word count for the previously displayed position range. The reading speed for this section is computed 708. From this historic data of measured reading speeds, an average reading speed can be computed and used to estimate cue durations.
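The FIG. 7 checks could look like the following sketch. `PositionRange` (word offsets, end exclusive), applying the 150-word minimum to the previously displayed range, and the `max_sigmas` tolerance (defaulting to one standard deviation of speed) are assumptions; the text names the conditions but not their exact form.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_WORDS = 150  # minimum word count given in the text


@dataclass(frozen=True)
class PositionRange:
    """Displayed range as word offsets into the book (end exclusive); an assumed representation."""
    start: int
    end: int

    @property
    def words(self) -> int:
        return self.end - self.start


def page_speed_if_valid(
    prev: PositionRange,
    new: PositionRange,
    elapsed_s: float,
    paused: bool,
    mean_wps: float | None = None,
    std_wps: float | None = None,
    max_sigmas: float = 1.0,
) -> float | None:
    """Return the speed (words/s) for `prev` if this page turn is a usable sample, else None."""
    if paused or elapsed_s <= 0:          # user must be actively reading
        return None
    if new.start <= prev.start:           # new range must lie beyond the previous one
        return None
    if new.start != prev.end:             # new range must start where the old one ended
        return None
    if prev.words < MIN_WORDS:            # too few words for a reliable measurement
        return None
    speed = prev.words / elapsed_s
    if mean_wps is not None and std_wps is not None:
        if abs(speed - mean_wps) > max_sigmas * std_wps:
            return None                   # outside the normal expected variance
    return speed
```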

The formula for calculating the reading speed $S_{p}$ (in words per second) for a page $p$ is:

$S_{p} = \frac{W_{p}}{T_{p}}$

where $W_{p}$ is the word count for the page and $T_{p}$ is the time taken to read the page, in seconds. In one implementation, the statistic used for the average reading speed is a 20-period exponential moving average (EMA), which smooths out fluctuations in speed while still weighting recent page speeds more heavily.

The formula for calculating the EMA is:

$M_{0} = S_{0}$

$M_{p} = \frac{n - 1}{n + 1} \times M_{p - 1} + \frac{2}{n + 1} \times S_{p}$

where $n$ is the number of periods, i.e., 20.
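The page-speed and EMA formulas above translate directly. This sketch folds a list of per-page speeds (words per second) into $M_p$ with $n = 20$, seeding with $M_0 = S_0$; keeping the whole list rather than only the running value is an assumption made for clarity.

```python
from __future__ import annotations


def page_speed(words: int, seconds: float) -> float:
    """S_p = W_p / T_p, in words per second."""
    if seconds <= 0:
        raise ValueError("reading time must be positive")
    return words / seconds


def ema(speeds: list[float], n: int = 20) -> float:
    """n-period EMA: M_0 = S_0, M_p = (n-1)/(n+1) * M_{p-1} + 2/(n+1) * S_p."""
    if not speeds:
        raise ValueError("need at least one page speed")
    m = speeds[0]                     # M_0 = S_0
    for s in speeds[1:]:
        m = (n - 1) / (n + 1) * m + 2 / (n + 1) * s
    return m
```

With $n = 20$, each new page contributes $2/21 \approx 9.5\%$ of the updated average, so a single unusually fast or slow page moves the estimate only slightly.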

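To calculate the variance in reading speeds, Welford's method for calculating variance is used over the last 20 values. Here $T_{k}$ is the $k$-th value, $M_{k}$ is the running mean and $S_{k}$ is the running sum of squared deviations from the mean (distinct from the EMA $M_{p}$ and page speed $S_{p}$ above):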

Initialize $M_{1} = T_{1}$ and $S_{1} = 0$.

For subsequent values of $T$, use the recurrence formulas:

$M_{k} = M_{k - 1} + \frac{T_{k} - M_{k - 1}}{k}$

$S_{k} = S_{k - 1} + (T_{k} - M_{k - 1}) \times (T_{k} - M_{k})$

For $2 \le k \le n$, the $k$-th estimate of the variance is:

$s^{2} = \frac{S_{k}}{k - 1}$
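Welford's recurrences can be sketched as below. The text applies them to the last 20 values; since the basic recurrence only adds values, this sketch (an assumed design, not necessarily the source's) keeps a 20-element `deque` and reruns the recurrence over the window whenever the variance is requested, which is cheap at this size.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable


def welford_variance(values: Iterable[float]) -> float:
    """Sample variance s^2 = S_k / (k - 1) using Welford's recurrences."""
    k = 0
    mean = 0.0   # M_k, running mean
    ssd = 0.0    # S_k, running sum of squared deviations
    for t in values:
        k += 1
        if k == 1:
            mean, ssd = t, 0.0          # M_1 = T_1, S_1 = 0
            continue
        prev_mean = mean
        mean = prev_mean + (t - prev_mean) / k
        ssd += (t - prev_mean) * (t - mean)
    if k < 2:
        raise ValueError("need at least two values")
    return ssd / (k - 1)


class RecentVariance:
    """Keeps the last `window` values and recomputes their variance on demand."""

    def __init__(self, window: int = 20) -> None:
        self.values: deque[float] = deque(maxlen=window)

    def add(self, t: float) -> None:
        self.values.append(t)          # oldest value drops out automatically

    def variance(self) -> float:
        return welford_variance(self.values)
```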

This reading speed information can be stored locally on the user's electronic book reader application platform. Such information for multiple users can be compiled and stored on a server in an anonymous fashion. The application could look up reading speed statistics to determine how fast others have read a work or portions of a work.

Other types of user interaction, instead of or in addition to reading speed, can be used to control playback.

In one implementation, the data about the user interaction with the electronic book indicates that the reader has started reading from a point within the book. This happens often, as a reader generally does not read a book from start to finish in one sitting. In some cases, when a reader restarts reading at a point within the book, the audio level, or other level of “excitement,” of the audio in the soundtrack at that point might not be appropriate. That is, the audio could actually be distracting at that point. The dynamic audio player can use an indication that the reader has started reading from a position within the book as an opportunity to select an audio cue other than the one that has been selected for the portion of the book that includes the current reading position.

As another example, the reader may be reading the book by skipping around from section to section. Other multimedia works may encourage such a manner of reading. In such a case, the audio cue associated with a section of a work is played when display of that section is initiated. A brief cross-fade from the audio of the previously displayed section to the audio for the newly displayed section can be performed. In some applications, where the nature of the work is such that the viewing time of any particular section is hard to predict, the dynamic playback engine can simply presume that the duration is indefinite and continue to generate audio based on the instructions in the cue file until an instruction is received to start another audio cue.

As another example, it is possible to use the audio cue files to play back different sections of a cue file in response to user inputs. For example, popular songs could be divided into sections. A user interface could be provided for controlling audio playback that would instruct the player to jump to a next section or to a specified section in response to a user input.

Having now described how such works and accompanying soundtracks are played back, their creation and distribution will now be discussed.

Creating a soundtrack for an electronic book involves associating audio files with portions of the text of the electronic book. There are several ways in which the soundtrack can be created.

In one implementation, a composer writes and records original music for each portion of the text. Each portion of the text can be associated with individual audio files that are so written and recorded. Alternatively, previously recorded music can be selected and associated directly with the portions of the text. In these implementations, the audio file is statically and directly assigned to portions of the text.

In another implementation, audio files are indirectly assigned to portions of the text. Tags, such as words or other labels, are associated with portions of the text. Such tags may be stored in a computer data file or database and associated with the electronic book, similar to the cue list described above. Corresponding tags also are associated with audio files. One or more composers write and record original music that is intended to evoke particular emotions or moods. Alternatively, previously recorded music can be selected. These audio files also are associated with such tags, and can be stored in a database. The tags associated with the portions of the text can be used to automatically select corresponding audio files with the same tags. In the event that multiple audio files are identified for a tag in the book, one of the audio files can be selected either by a computer or through human intervention. This implementation allows audio files to be collected in a database, and the creation of a soundtrack to be completed semi-automatically, by automating the process of selecting audio files given the tags associated with the electronic book and with audio files.
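Indirect, tag-based assignment might be sketched as follows. The one-tag-per-portion model, the dictionary shapes and the default of taking the alphabetically first matching file are assumptions; the `choose` callback stands in for either a computer policy or a human picking among several matches.

```python
from __future__ import annotations

from typing import Callable, Optional


def match_audio(
    portion_tags: dict[str, str],
    audio_tags: dict[str, set[str]],
    choose: Optional[Callable[[list[str]], str]] = None,
) -> dict[str, Optional[str]]:
    """Assign each text portion an audio file carrying the same tag (None if none match)."""
    # Assumed default policy: take the alphabetically first candidate.
    choose = choose or (lambda candidates: candidates[0])
    assignment: dict[str, Optional[str]] = {}
    for portion, tag in portion_tags.items():
        candidates = sorted(f for f, tags in audio_tags.items() if tag in tags)
        assignment[portion] = choose(candidates) if candidates else None
    return assignment
```

Portions with no matching file are left unassigned (`None`), which a soundtrack author could then fill by hand.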

In an implementation where audio files are indirectly associated with the electronic book, the audio files also can be dynamically selected using the tags at a time closer to playback.

The process of associating tags with the electronic book also can be automated. In particular, the text can be processed by a computer to associate emotional descriptors with portions of the text based on a semantic analysis of the words of the text. Example techniques for such semantic analysis include, but are not limited to, those described in “Emotions from text: machine learning for text-based emotion prediction,” by Cecilia Ovesdotter Alm et al., in Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (October 2005), pp. 579-586, which is hereby incorporated by reference. These tags can describe the emotional feeling or other sentiment that supports the section of the work being viewed. For example, these emotional feelings can include, but are not limited to, medium tension, love interest, tension, jaunty, macho, dark, brooding, ghostly, happy, sad, wistful, sexy moments, bright and sunny.

FIG. 8 is a data flow diagram that illustrates an example of a fully automated process for creating a soundtrack for an electronic book, given audio files that have tags associated with them. An electronic book 800 is input to an emotional descriptor generator 802 that outputs the emotional descriptors and text ranges 804 for the book. The emotional descriptors are used to look up, in an audio database 806, audio files 810 that match the emotional descriptors for each range in the book. The audio selector 808 allows for automated, random or semi-automated selection of an audio file for each text range to generate a cue list 812. A unique identifier can be generated for the electronic book and stored with the cue list 812.
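An end-to-end toy version of the FIG. 8 flow is sketched below. The keyword `LEXICON` is a hypothetical stand-in for the emotional descriptor generator 802; it is not the semantic-analysis technique cited above. Ranges are assumed to be `(start, end, text)` tuples, the selector 808 picks randomly among matches using a seeded generator, and `uuid.uuid4()` is one assumed way to produce the book's unique identifier.

```python
import random
import string
import uuid
from dataclasses import dataclass

# Hypothetical keyword lexicon standing in for the emotional descriptor
# generator 802; a real system would use semantic analysis of the text.
LEXICON = {"ghost": "ghostly", "laughed": "happy", "wept": "sad", "shadow": "dark"}


def describe(ranges):
    """802 -> 804: attach one emotional descriptor to each (start, end, text) range."""
    tagged = []
    for start, end, text in ranges:
        words = (w.strip(string.punctuation) for w in text.lower().split())
        tag = next((LEXICON[w] for w in words if w in LEXICON), "neutral")
        tagged.append((start, end, tag))
    return tagged


@dataclass
class CueList:
    book_id: str   # unique identifier stored with the cue list
    cues: list     # (start, end, descriptor, audio file or None)


def generate_cue_list(ranges, audio_db, seed=0):
    """806/808 -> 812: look up files sharing each descriptor and pick one at random."""
    rng = random.Random(seed)
    cues = []
    for start, end, tag in describe(ranges):
        matches = sorted(f for f, tags in audio_db.items() if tag in tags)
        cues.append((start, end, tag, rng.choice(matches) if matches else None))
    return CueList(book_id=str(uuid.uuid4()), cues=cues)
```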

Such electronic books and their soundtracks can be distributed in any of a variety of ways, including but not limited to currently used ways for commercial distribution of electronic books. In one implementation, the electronic book and the electronic book reader are distributed to end users using conventional techniques. The distribution of the additional soundtrack and dynamic audio player is completed separately. The distribution of the soundtrack is generally completed in two steps: first the cue list is downloaded, and then each audio file is downloaded. The audio files can be downloaded on demand. The dynamic audio player can include a file manager that maintains information about available cue files that may be stored on the same device on which the electronic book reader operates, or that may be stored remotely.

In one implementation, the electronic book is distributed to end users along with the cue list and dynamic audio player.

In another implementation, the electronic book and its associated cue list are distributed together. The cue list is then used to download the audio files for the soundtrack as a background task. In one implementation, the electronic book is downloaded first and the download of the cue list is initiated as a background task, and then the first audio file for the first cue is immediately downloaded.

In another implementation, the electronic book reader is a device with local storage that includes local generic cues, having a variety of emotional descriptors, that can be selected for playback in accordance with the cue list. These generic cues would allow playback of audio if a remote audio file became unavailable.

In one implementation, the electronic book reader application is loaded on a platform that has access to a network, such as the Internet, through which it can communicate with a distributor of electronic media. Such a distributor may receive a request to purchase and/or download electronic media from users. After receiving the request, the distributor may retrieve the requested work and its accompanying soundtrack information from a database. The retrieved electronic media can be encrypted and sent to the user of the electronic book reader application. The electronic media may be encrypted such that the electronic media may be played only on a single electronic book reader. Typically, the digital rights management information associated with the work also is applied to the soundtrack information.

In this description, specific details are given to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, software modules, functions, circuits, etc., may be shown in block diagrams in order not to obscure the embodiments in unnecessary detail. In other instances, well-known modules, structures and techniques may not be shown in detail in order not to obscure the embodiments.

Also, it is noted that the embodiments may be described as a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc., in a computer program. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or a main function.

Aspects of the systems and methods described herein may be operable on any type of general purpose computer system or computing device, including, but not limited to, a desktop, laptop, notebook, tablet or mobile device. The term “mobile device” includes, but is not limited to, a wireless device, a mobile phone, a mobile communication device, a user communication device, personal digital assistant, mobile hand-held computer, a laptop computer, an electronic book reader and reading devices capable of reading electronic contents and/or other types of mobile devices typically carried by individuals and/or having some form of communication capabilities (e.g., wireless, infrared, short-range radio, etc.).

FIG. 9 is a block diagram illustrating the internal functional architecture of a computer system 900 usable with one or more aspects of the systems and methods described herein. As shown in FIG. 9, the computer system 900 may include a central processing unit (CPU) 914 that executes computer-executable process steps and interfaces with a computer bus 916. Also shown in FIG. 9 are a network interface 918, a display device interface 920, a keyboard or input interface 922, a pointing device interface 924, an audio interface 926, a video interface 932, and a hard disk drive 934 or other persistent storage.

As described above, the disk 934 may store operating system program files, application program files, web browsers, and other files. Some of these files may be stored on the disk 934 using an installation program. For example, the CPU 914 may execute computer-executable process steps of an installation program so that the CPU 914 can properly execute the application program.

A random access main memory (“RAM”) 936 may also interface to the computer bus 916 to provide the CPU 914 with access to memory storage. When executing stored computer-executable process steps from the disk 934, the CPU 914 stores and executes the process steps out of the RAM 936. Data to be processed also can be read from such memory 936 or storage 934, and stored in such memory 936 or storage 934. Read only memory (“ROM”) 938 may be provided to store invariant instruction sequences such as start-up instruction sequences or basic input/output system (BIOS) sequences for operation of the keyboard 922.

An electronic book reader, or other application for providing visual displays of electronic books and other multimedia works, can be implemented on a platform such as described in FIG. 9.

In the following description, an electronic book and an electronic book reader are used as examples of the kind of multimedia work and corresponding viewer with which playback of a soundtrack can be synchronized. Other kinds of multimedia works in which the duration of the visual display of a portion of the work is dependent on user interaction with the work also can use this kind of synchronization. The term electronic book is intended to encompass books, magazines, newsletters, newspapers, periodicals, maps, articles, and other works that are primarily text or text with accompanying graphics or other visual media.

In the foregoing, a storage medium may represent one or more devices for storing data, including read-only memory (ROM), random access memory (RAM), magnetic disk storage mediums, optical storage mediums, flash memory devices and/or other machine readable mediums for storing information. The terms “machine readable medium” and “computer readable medium” include, but are not limited to, portable or fixed storage devices, optical storage devices, and/or various other mediums capable of storing, containing or carrying instruction(s) and/or data.

Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium such as a storage medium or other storage(s). A processor may perform the necessary tasks. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.

The various illustrative logical blocks, modules, circuits, elements, and/or components described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic component, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, circuit, and/or state machine. A processor may also be implemented as a combination of computing components, e.g., a combination of a DSP and a microprocessor, a number of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The methods or algorithms described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executable by a processor, or in a combination of both, in the form of processing unit, programming instructions, or other directions, and may be contained in a single device or distributed across multiple devices. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.

One or more of the components and functions illustrated by the figures may be rearranged and/or combined into a single component or embodied in several components without departing from the invention. Additional elements or components may also be added without departing from the invention. Additionally, the features described herein may be implemented in software, hardware, as a business method, and/or combination thereof.

While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, having been presented by way of example only, and that this invention is not to be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art.

What is claimed is:
 1. A computer-implemented process for dynamic playback of audio, comprising: receiving data about user interaction with a portion of an electronic visual work; and dynamically adjusting a duration of playback of audio associated with the portion of the electronic visual work according to the user interaction.
 2. The computer-implemented process of claim 1, further comprising: estimating a duration for visual display of the portion of the electronic visual work according to the received data about user interaction with the portion of the electronic visual work; and selecting a sequence of mixes of stems of audio associated with the portion of the electronic visual work so as to provide audio with the estimated duration.
 3. The computer-implemented process of claim 2, wherein the duration is estimated using a history of reading speeds.
 4. A computer-implemented process for playing a soundtrack in synchronization with display of an electronic visual work, comprising: receiving into memory the electronic visual work; receiving into memory information associating portions of the electronic visual work with tags; displaying portions of the electronic visual work in response to user interaction; accessing audio files with tags; selecting, using the processor, audio files to be associated with portions of the electronic visual work according to the tags associated with the portions of the electronic visual work; receiving data about user interaction with a portion of the electronic visual work; and dynamically adjusting a duration of playback of audio associated with the portion of the electronic visual work according to the user interaction.
 5. A computer-implemented process for generating a soundtrack for an electronic visual work, comprising: receiving the electronic visual work into memory; processing, by a processor, the electronic visual work in the memory, to mark portions of the electronic visual work by associating, in memory, tags with portions of the electronic visual work; accessing audio files with tags; selecting, using the processor, audio files for portions of the electronic visual work according to the tags associated with the portions of the electronic visual work.
 6. The computer-implemented process of claim 5, wherein the electronic visual work includes text and the processing includes processing the text.
 7. The computer-implemented process of claim 6, wherein the tags include emotional descriptors.
 8. A digital information product, comprising: a computer readable medium; computer readable data stored on the computer readable medium that, when processed by a computer, is interpreted by the computer to define a computer readable file including a cue list, including, for each portion of an electronic visual work, an emotional descriptor, wherein the emotional descriptors correspond to emotional descriptors also associated with audio data.
 9. A digital information product, comprising: a computer readable medium; computer readable data stored on the computer readable medium that, when processed by a computer, is interpreted by the computer to define a computer readable file including data defining an audio cue, including audio data for a plurality of stems that can be mixed to provide audio data and information indicative of how the stems can be repeated and combined.